Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics

Authors

  • Uri Keich
  • Attila Kertesz-Farkas
  • William Stafford Noble

Abstract

Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target peptide-spectrum matches (PSMs) does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
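To make the two families of protocols named above more concrete, here is a minimal Python sketch of generic target-decoy competition (TDC) and separate target-decoy search FDR estimates. It assumes one best target score and one best decoy score per spectrum, with higher scores being better. The function names are hypothetical, ties in the competition are awarded to the target only for determinism, and the separate-search factor `pi0` is passed in rather than estimated. This illustrates the general protocols; it is not the authors' code and it does not implement the proposed mix-max procedure.

```python
from typing import List, Tuple

# After competition, each spectrum contributes one (score, is_decoy) pair.
Winner = Tuple[float, bool]


def tdc_compete(target_scores: List[float], decoy_scores: List[float]) -> List[Winner]:
    """Target-decoy competition: per spectrum, keep only the better of the
    best target match and the best decoy match (higher score = better).
    Ties go to the target here for determinism; this is an assumption."""
    if len(target_scores) != len(decoy_scores):
        raise ValueError("need one target and one decoy score per spectrum")
    return [(d, True) if d > t else (t, False)
            for t, d in zip(target_scores, decoy_scores)]


def tdc_fdr(winners: List[Winner], threshold: float, variant: str = "targets") -> float:
    """Estimate the FDR of competition winners scoring >= threshold.

    variant="targets": decoys / targets, the FDR among accepted target PSMs.
    variant="all":     2 * decoys / (targets + decoys), an estimate over the
                       combined target+decoy list (Elias and Gygi style).
    """
    targets = sum(1 for s, is_decoy in winners if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in winners if s >= threshold and is_decoy)
    if variant == "targets":
        num, den = decoys, targets
    elif variant == "all":
        num, den = 2 * decoys, targets + decoys
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    # With nothing accepted, report 0; cap the estimate at 1.
    return min(1.0, num / den) if den else 0.0


def separate_search_fdr(target_scores: List[float], decoy_scores: List[float],
                        threshold: float, pi0: float = 1.0) -> float:
    """Separate target-decoy search: targets and decoys are searched
    independently, and the count of decoy PSMs >= threshold, scaled by pi0
    (the assumed proportion of incorrect target PSMs), estimates the count of
    false target PSMs >= threshold. pi0 is supplied here, not estimated."""
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    return min(1.0, pi0 * decoys / targets) if targets else 0.0


if __name__ == "__main__":
    target = [10.0, 9.0, 8.0, 3.0, 2.0]
    decoy = [1.0, 2.0, 8.5, 4.0, 1.0]
    winners = tdc_compete(target, decoy)
    print(tdc_fdr(winners, 5.0))                    # 1 decoy / 2 targets = 0.5
    print(tdc_fdr(winners, 5.0, "all"))             # 2 * 1 / 3 = 0.666...
    print(separate_search_fdr(target, decoy, 5.0))  # 1 decoy / 3 targets = 0.333...
```

In the toy data, the third spectrum's target (score 8.0) loses to its decoy (8.5) under competition and is dropped from the TDC list, while the separate search keeps it. This mirrors, in miniature, the abstract's point that TDC can discard high-scoring target identifications.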

Volume 14

Published 2015